(i) List of contents

This data repository contains three folders: (1) code, (2) data, and (3) output. 

The code folder contains five R scripts: (1) 1_replication_topic_model_preprocessing, (2) 2_replication_topic_model, (3) 3_replication_survey_preprocessing, (4) 4_replication_analysis, and (5) 5_replication_appendix. Run these in the order in which they are numbered. The first script takes raw text transcripts and preprocesses them for a structural topic model; it saves text_preprocessed.rds into the data/processed folder. The second script runs a structural topic model on the preprocessed text data and saves topic_model.rds into the data/processed folder. The third script loads the raw survey data and county-level covariates, merges in the topic model produced by the second script, and saves master_data.rds into the data/processed folder. The fourth and fifth scripts replicate the findings in the main paper and the appendix, respectively.

The data folder contains three subfolders: (1) transcripts, (2) raw, and (3) processed. Transcripts contains the raw text transcripts from Factiva for input into 1_replication_topic_model_preprocessing. Raw contains the survey data and all covariates for input into 3_replication_survey_preprocessing. And processed contains master_data.rds, text_preprocessed.rds, and topic_model.rds for anyone who wishes to skip scripts 1-3 and go directly to replicating the results. 

The output folder collects 53 output files produced by the five scripts. 


(ii) Computation requirements

All code to replicate this paper and its supplementary appendix was run with R version 4.4.0 (2024-04-24). Some of the more computationally intensive code in 5_replication_appendix was run in parallel, using all but one of the host computer's cores. This speeds up the process significantly but leaves the computer effectively unusable for other tasks while the parallel code is running. The parallelization was written and run on a PC (Lenovo evo i7 with 12 cores and 16 logical processors), though it should run on Macs as well.
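
As a rough illustration of the "all but one core" pattern described above, the following is a minimal sketch using the parallel package. It is not the code from 5_replication_appendix; the toy workload and the variable names are assumptions made for this example.

```r
library(parallel)

# Count logical cores; detectCores() can return NA on some platforms.
n_cores <- detectCores()

# Leave one core free, but always use at least one worker.
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

# A PSOCK cluster (the makeCluster() default) works on Windows and macOS.
cl <- makeCluster(n_workers)

# Toy workload standing in for one expensive computation per element.
squares <- parLapply(cl, 1:8, function(i) i^2)

# Release the worker processes when finished.
stopCluster(cl)
```

Lowering n_workers is one way to keep the computer usable for other tasks, at the cost of a longer run time.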

The approximate run time of each script is noted at the top of that script, though it will vary with computing power and with whether you remove parallelization. In order, the five scripts take approximately 1.75 hours, 3.75 hours, 1 minute, 5 minutes, and 30 hours to run. The 5_replication_appendix script can be run in a few minutes if you skip Appendix A.9 and Appendix A.17.

R libraries used include stringi, tidyverse, stm (version 1.3.6), tm (version 0.7-8), ggplot2, geometry, Rtsne, rsvd, tidystm, dplyr, readxl, zoo, survey, stargazer, sjPlot, ggpubr, estimatr, interflex, specr, parallel, MASS, car, effects, lmtest, sandwich, and texreg. I use older versions of stm and tm to match the versions at the time of the original analysis. Because R 4.4 introduced a stricter numeric_version() check that causes trouble with version 0.7-8 of tm, 2_replication_topic_model sets an environment variable that downgrades this check from an error to a warning. This is accomplished with the line of code Sys.setenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_" = "false") in line 73.
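
The sketch below shows the environment variable setting quoted above, followed by a small report of installed versus pinned package versions. The helper installed_version and the object names are hypothetical and are not part of the replication scripts. The commented install lines assume the remotes package is available; they are one possible way to obtain archived versions, not necessarily the way the versions were installed for the original analysis.

```r
# Downgrade the strict numeric_version() check from an error to a warning.
Sys.setenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_" = "false")

# Versions named in this README.
pinned <- c(stm = "1.3.6", tm = "0.7-8")

# Hypothetical helper: installed version as text, or NA if not installed.
installed_version <- function(pkg) {
  tryCatch(as.character(utils::packageVersion(pkg)),
           error = function(e) NA_character_)
}

# Side-by-side report; nothing is installed or loaded here.
version_report <- data.frame(
  package   = names(pinned),
  pinned    = unname(pinned),
  installed = vapply(names(pinned), installed_version, character(1),
                     USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(version_report)

# One possible way to install the pinned versions (assumes 'remotes'):
# remotes::install_version("stm", version = "1.3.6")
# remotes::install_version("tm", version = "0.7-8")
```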


(iii) Instructions for data preparation and analysis

Download and save the data review folder, maintaining the file structure. The top of each script sets the working directory to the source file location; data are loaded and output is saved relative to this location. Changing the folder structure will cause the code to return an error.
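
Before running anything, it may help to confirm that the downloaded folder still has the layout described in Section (i). The following is a minimal sketch; the function check_structure and the demo folder are hypothetical and are not part of the replication scripts. Because the raw transcripts are proprietary, the transcripts folder may legitimately be absent or empty in a downloaded copy.

```r
# Hypothetical helper: report which expected folders exist under 'root'.
check_structure <- function(root) {
  expected <- c(
    "code",
    "data",
    file.path("data", "transcripts"),
    file.path("data", "raw"),
    file.path("data", "processed"),
    "output"
  )
  present <- dir.exists(file.path(root, expected))
  names(present) <- expected
  present
}

# Demo on a throwaway folder that mimics the repository layout.
demo_root <- file.path(tempdir(), "replication_demo")
demo_dirs <- c("code",
               file.path("data", c("transcripts", "raw", "processed")),
               "output")
for (d in demo_dirs) {
  dir.create(file.path(demo_root, d), recursive = TRUE, showWarnings = FALSE)
}

status <- check_structure(demo_root)
print(status)
```

In practice, root would be the path to the downloaded folder rather than demo_root.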

The five scripts should be run in the order in which they are numbered. If you wish to skip making the topic model and go straight to processing the survey, begin at script 3 and use the provided topic_model.rds file in data/processed. If you wish to skip prepping the survey data and go straight to replicating the main results or the appendices, begin at script 4 or script 5, respectively, and use the provided master_data.rds file in data/processed. Please note that scripts 1-3 output some of the in-text numbers and figures in the main analyses, so the output of any script you skip will not be created. The main results (Tables 2 and 3 and Figures 3-5) will still be created by script 4. See Section (v) for more details.
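
The run order and the option to start at a later script can be sketched as a small runner. This is an illustration only: run_from is a hypothetical helper, the ".R" file extension is an assumption, and it has not been verified that each script's own working-directory setup behaves the same under source() as when the script is opened and run interactively.

```r
# Script names as listed in Section (i); the extension is an assumption.
script_names <- c(
  "1_replication_topic_model_preprocessing",
  "2_replication_topic_model",
  "3_replication_survey_preprocessing",
  "4_replication_analysis",
  "5_replication_appendix"
)

# Hypothetical helper: run script 'start' and every later script, in order.
run_from <- function(start, code_dir, ext = ".R") {
  stopifnot(start %in% seq_along(script_names))
  to_run <- script_names[start:length(script_names)]
  for (path in file.path(code_dir, paste0(to_run, ext))) {
    if (!file.exists(path)) stop("Script not found: ", path)
    message("Running ", basename(path))
    source(path)
  }
  invisible(to_run)
}

# Example (not run): start at the survey preprocessing step.
# run_from(3, code_dir = "code")
```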

##### A note on proprietary data #####

Several pieces of data are proprietary and are not made available in the repository. These include: (1) data on hospital beds per facility from the American Hospital Association, (2) the raw transcripts from Factiva, and (3) the replication file from Neumann et al. (2024). Proprietary data are marked with ** in Section (iv). In practice, this means that individuals wishing to replicate the results must use the processed RDS files included in data/processed to run scripts 4 and 5.
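
For readers who want to inspect the processed files directly, the following is a minimal sketch of loading them with readRDS(). The helper load_processed is hypothetical and is not part of the replication scripts; the three file names are those listed in Section (i).

```r
# Hypothetical helper: read the processed .rds files into a named list,
# stopping with a clear message if any of them is missing.
load_processed <- function(processed_dir,
                           files = c("text_preprocessed.rds",
                                     "topic_model.rds",
                                     "master_data.rds")) {
  paths <- file.path(processed_dir, files)
  missing_files <- files[!file.exists(paths)]
  if (length(missing_files) > 0) {
    stop("Missing processed file(s): ",
         paste(missing_files, collapse = ", "))
  }
  objects <- lapply(paths, readRDS)
  names(objects) <- sub("\\.rds$", "", files)
  objects
}

# Example (not run), relative to the top of the downloaded folder:
# processed <- load_processed(file.path("data", "processed"))
# str(processed$master_data, max.level = 1)
```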


(iv) Description of data sources

(1) YouGov survey data (data/raw)

file name: covid_19_economist_stacked_replication.csv
source: YouGov/Economist
description: individual-level survey data collected by YouGov on behalf of the Economist
note: for accompanying codebook, see Survey Data Codebook.txt in the data/raw folder.


(2) HUD-USPS ZIP code-county crosswalk (data/raw)

file name: ZIP_COUNTY_122020.csv
source: Download at https://www.huduser.gov/apps/public/uspscrosswalk/home
description: Crosswalk for mapping ZIP codes to counties
note: You need to register in order to download. Download the Q4 2020 ZIP code to County file.


(3) Daily national COVID-19 deaths (data/raw)

file name: data_table_for_total_deaths__united_states.csv
source: CDC. Download at: https://covid.cdc.gov/covid-data-tracker/#trends_select_select_00
description: Total number of cumulative nationwide COVID-19 deaths by day
note: The data appear to have been removed as of September 3, 2025


(4) Daily county-level COVID-19 cases and deaths (data/raw)

file names: time_series_covid19_confirmed_US.csv (cases) and time_series_covid19_deaths_US.csv (deaths)
source: Johns Hopkins Center for Systems Science and Engineering. Download at: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
description: Total number of daily COVID-19 cases and deaths per county


(5) American Hospital Association data on hospital beds (data/raw)**

file name: American_Hospital_Association_2019.csv
source: American Hospital Association
description: number of hospital beds per facility
note: **This data is proprietary and is not included in the repository


(6) Media transcripts (data/transcripts)**

file names: USA Today.csv, ABC World News Tonight.csv, All In with Chris Hayes.csv, Hannity.csv, PoliticsNation.csv, The Five.csv, The Ingraham Angle.csv, The Last Word with Lawrence ODonnell.csv, The Rachel Maddow Show.csv, Tucker Carlson Tonight.csv
source: Factiva. Must have an account to download. Download all broadcast transcripts and all front-page articles from USA Today appearing in 2020.
description: raw text from USA Today, ABC World News Tonight, and the four most popular programs on each of Fox News and MSNBC
note: **This data is proprietary and is not included in the repository


(7) County-level educational attainment (data/raw)

file name: Education.csv
source: USDA Economic Research Service. Data are from the 2015-2019 American Community Survey 5-year estimates, accessed via U.S. Department of Agriculture Economic Research Service. Download at: https://ers.usda.gov/data-products/county-level-data-sets/county-level-data-sets-download-data
description: county-level educational attainment


(8) County-level Black and Hispanic population (data/raw)

file name: county_race.csv
source: Data are from the 2015-2019 American Community Survey 5-year estimates
description: county-level proportion of Black and Hispanic population


(9) County-level median household income (data/raw)

file name: est19all.csv
source: Data are estimated for 2019 via U.S. Census Bureau Small Area Income and Poverty Estimates Program. Download at: https://www2.census.gov/programs-surveys/saipe/datasets/2019/2019-state-and-county/est19all.xls
description: county-level median household income
note: Download XLS file; delete rows 1-3; delete columns E-V and X-AE; create new column labeled FIPS and enter =CONCAT(A#,B#) for all rows #; save as CSV
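
The spreadsheet steps in the note above can also be expressed in R. This is a sketch of an equivalent, not the procedure that was actually used: prep_saipe is a hypothetical helper, the toy data frame is made up purely to show the column arithmetic, and the commented read_excel() call assumes the readxl package and the downloaded est19all.xls file. Codes are kept as text here so that leading zeros survive; whether the original CSV preserved them is not stated.

```r
# Hypothetical helper: drop spreadsheet columns E-V (5-22) and X-AE (24-31),
# then add FIPS as the concatenation of columns A and B.
prep_saipe <- function(raw) {
  stopifnot(ncol(raw) >= 31)
  out <- raw[, -c(5:22, 24:31), drop = FALSE]
  out$FIPS <- paste0(out[[1]], out[[2]])
  out
}

# Toy stand-in with 31 columns (made-up values, for illustration only).
toy <- as.data.frame(matrix("x", nrow = 2, ncol = 31),
                     stringsAsFactors = FALSE)
toy[[1]] <- c("01", "01")    # column A
toy[[2]] <- c("000", "001")  # column B

result <- prep_saipe(toy)
print(result$FIPS)

# With the real file (not run); skip = 3 mirrors "delete rows 1-3":
# raw <- readxl::read_excel("est19all.xls", skip = 3, col_types = "text")
# write.csv(prep_saipe(raw), "est19all.csv", row.names = FALSE)
```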


(10) County-level 2016 presidential vote share (data/raw)

file name: countypres_2016
source: Data are from MIT Election Data and Science Lab. Download at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ
description: county-level 2016 presidential vote share


(11) County-level unemployment and rural-urban continuum (data/raw)

file name: Unemployment.xlsx
source: USDA Economic Research Service. Download at: https://ers.usda.gov/sites/default/files/_laserfiche/DataFiles/48747/Unemployment2023.xlsx?v=58112
description: county-level unemployment and rural-urban continuum.
note: Download XLSX file; delete rows 1-4 of first tab
	

(12) Hand-coding of Fox News and MSNBC transcripts (data/raw)

file name: hand_coding.csv
source: Authors and undergraduate research assistants
description: results of hand-coding of Fox News and MSNBC transcripts


(13) Neumann et al. (2024) replication data (data/raw)**

file name: mask_stories_per_day.rdata
source: Neumann, Markus et al. (2024). “Politicizing Masks? Examining the Volume and Content of Local News Coverage of Face Coverings in the U.S. Through the COVID-19 Pandemic”. In: Political Communication 41 (1), pp. 66–106. doi: 10.1080/10584609.2023.2239181.
description: replication data from Neumann et al. (2024)
note: **This data is not included in the repository, but is available upon reasonable request from the original authors. 


(v) Description of output

Output files include figures, tables, and in-text numbers for the main paper and the supplementary appendix. In total, the scripts produce 53 output files, listed below with a brief description where applicable and indicating the script that produces the file. 

Appendix A.1.txt -- in-text number for Appendix A.1 (5_replication_appendix)
Appendix A.3.txt -- in-text number for Appendix A.3 (5_replication_appendix)
Appendix A.5.csv -- in-text word lists for Appendix A.5 (2_replication_topic_model)
Appendix A.6.txt -- in-text number for Appendix A.6 (5_replication_appendix)
Appendix A.9 spec curve logged cases.txt -- in-text number for Appendix A.9 (5_replication_appendix)
Appendix A.9 spec curve logged deaths.txt -- in-text number for Appendix A.9 (5_replication_appendix)
Appendix A.9.txt -- in-text number for Appendix A.9 (5_replication_appendix)
Appendix A.11.txt -- in-text number for Appendix A.11 (5_replication_appendix)
Appendix A.12.txt -- in-text number for Appendix A.12 (3_replication_survey_preprocessing)
Appendix A.13.txt -- in-text number for Appendix A.13 (5_replication_appendix)
Appendix A.15.txt -- in-text number for Appendix A.15 (5_replication_appendix)
Appendix A.17.txt -- in-text number for Appendix A.17 (5_replication_appendix)
Figure 1.png (3_replication_survey_preprocessing)
Figure 2.png (2_replication_topic_model)
Figure 3.png (4_replication_analysis)
Figure 4.png (4_replication_analysis)
Figure 5.png (4_replication_analysis)
Figure A.1.png (3_replication_survey_preprocessing)
Figure A.2.png (2_replication_topic_model)
Figure A.3.png (2_replication_topic_model)
Figure A.4.png (2_replication_topic_model)
Figure A.5.png (5_replication_appendix)
Figure A.6.png (5_replication_appendix)
Figure A.7.png (5_replication_appendix)
Figure A.8.png (5_replication_appendix)
Figure A.9.png (5_replication_appendix)
Figure A.10.png (3_replication_survey_preprocessing)
Figure A.11.png (5_replication_appendix)
Figure A.12.png (5_replication_appendix)
Figure A.13.pdf (4_replication_analysis)
Figure A.14.png (5_replication_appendix)
Figure A.15.png (5_replication_appendix)
footnote_3.txt -- in-text number for footnote 3 in main text (3_replication_survey_preprocessing)
footnote_8.txt -- in-text number for footnote 8 in main text (4_replication_analysis)
footnote_9.txt -- in-text number for footnote 9 in main text (2_replication_topic_model)
footnote_11.txt -- in-text number for footnote 11 in main text (4_replication_analysis)
footnote_12.txt -- in-text number for footnote 12 in main text (3_replication_survey_preprocessing)
pg1_no_counties.txt -- in-text number on page 1 of main text (4_replication_analysis)
pg3_misperceptions_by_party_education.txt -- in-text number on page 3 of main text (4_replication_analysis)
pg5_corpus_keywords.txt -- in-text number on page 5 of main text (2_replication_topic_model)
pg6_news_sources.txt -- in-text number on page 6 of main text (4_replication_analysis)
pg12_correlation_masks.txt -- in-text number on page 12 of main text (5_replication_appendix)
pg12_correlation_topics.txt -- in-text number on page 12 of main text (2_replication_topic_model)
Table 2.csv (4_replication_analysis)
Table 3.csv (4_replication_analysis)
Table A.1.csv (5_replication_appendix)
Table A.2.csv (5_replication_appendix)
Table A.3.txt (5_replication_appendix)
Table A.4.txt (4_replication_analysis)
Table A.5.txt (5_replication_appendix)
Table A.6.txt (5_replication_appendix)
Table A.7.txt (5_replication_appendix)
Table A.8.txt (5_replication_appendix)
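
After running the scripts, the output folder can be compared against the list above. The following is a minimal sketch; missing_outputs is a hypothetical helper, and the vector below covers only the main results (Tables 2 and 3 and Figures 3-5) rather than all 53 files.

```r
# Hypothetical helper: which expected files are absent from 'output_dir'?
missing_outputs <- function(output_dir, expected) {
  setdiff(expected, list.files(output_dir))
}

# Main-results files, as named in the list above.
main_results <- c("Table 2.csv", "Table 3.csv",
                  "Figure 3.png", "Figure 4.png", "Figure 5.png")

# Demo on a throwaway folder containing only two of the five files.
demo_output <- file.path(tempdir(), "output_demo")
dir.create(demo_output, showWarnings = FALSE)
invisible(file.create(file.path(demo_output,
                                c("Table 2.csv", "Figure 3.png"))))

still_missing <- missing_outputs(demo_output, main_results)
print(still_missing)

# Example (not run), relative to the top of the downloaded folder:
# missing_outputs("output", main_results)
```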